Dynamic Helper Threaded Prefetching on the Sun UltraSPARC CMP Processor
Abstract
Data prefetching via helper threads has been investigated extensively on Simultaneous Multithreading (SMT) and Virtual Multi-Threading (VMT) architectures. Although helper threads have reportedly hidden long cache latencies at runtime, most techniques rely on hardware support to reduce the context-switch overhead between the main thread and the helper thread, and on static profile feedback to construct the helper-thread code. This paper develops a new solution that performs helper-threaded prefetching through dynamic optimization on the latest UltraSPARC Chip Multiprocessing (CMP) processor. Our experiments show that, by using an otherwise idle processor core, a single user-level helper thread suffices to improve the runtime performance of the main thread without triggering multiple thread slices. Moreover, because the cores of a CMP are physically decoupled, the contention introduced by helper threading is minimal. This paper also discusses several key technical challenges in building a lightweight dynamic optimization/software scouting system on the UltraSPARC/Solaris platform.
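The abstract describes helper-threaded prefetching only at a high level. As a hedged illustration, the following is a minimal source-level sketch in C (POSIX threads, C11 atomics, and the GCC/Clang `__builtin_prefetch` builtin) of a single helper thread that runs ahead of a main thread walking a pointer-chasing linked list, touching nodes before the main thread needs them. It is not the paper's mechanism: the paper constructs helper code through dynamic optimization of the running program, whereas this sketch writes the helper by hand. It assumes the two cores share a cache level, so lines the helper brings in are still resident when the main thread arrives. All names (`prefetch_helper`, `RUN_AHEAD`, `list_sum_with_helper`, and so on) and the run-ahead distance are hypothetical, and binding the helper to the idle core is OS-specific and omitted.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical sketch; not the paper's code. Assumes 64-byte cache lines. */
struct node {
    struct node *next;
    long long value;
    char pad[48]; /* spread nodes over separate cache lines */
};

struct list {
    struct node *pool; /* backing array, released by list_free */
    struct node *head; /* first node in traversal order */
    size_t len;
};

/* Link n nodes in pseudo-random order so each hop is a likely cache miss. */
static struct list list_build(size_t n) {
    struct list l = { malloc(n * sizeof(struct node)), NULL, n };
    size_t *order = malloc(n * sizeof(size_t));
    assert(n == 0 || (l.pool != NULL && order != NULL));
    for (size_t i = 0; i < n; i++) order[i] = i;
    unsigned long long seed = 88172645463325252ULL;
    for (size_t i = n; i > 1; i--) { /* Fisher-Yates shuffle driven by xorshift64 */
        seed ^= seed << 13; seed ^= seed >> 7; seed ^= seed << 17;
        size_t j = (size_t)(seed % i);
        size_t t = order[i - 1]; order[i - 1] = order[j]; order[j] = t;
    }
    for (size_t k = 0; k < n; k++) { /* values 0..n-1 along the traversal order */
        struct node *cur = &l.pool[order[k]];
        cur->value = (long long)k;
        cur->next = (k + 1 < n) ? &l.pool[order[k + 1]] : NULL;
    }
    l.head = (n > 0) ? &l.pool[order[0]] : NULL;
    free(order);
    return l;
}

static void list_free(struct list *l) {
    free(l->pool);
    l->pool = NULL; l->head = NULL; l->len = 0;
}

/* Plain traversal: every hop may stall on a cache miss. */
static long long list_sum(const struct node *p) {
    long long sum = 0;
    for (; p != NULL; p = p->next) sum += p->value;
    return sum;
}

enum { RUN_AHEAD = 64 }; /* hypothetical cap on how far the helper may lead */

struct shared {
    struct node *head;
    atomic_size_t main_pos; /* nodes already consumed by the main thread */
    atomic_bool done;       /* set when the main thread finishes */
};

/* Helper thread: chases the same pointers ahead of the main thread. */
static void *prefetch_helper(void *arg) {
    struct shared *s = arg;
    size_t pos = 0;
    for (struct node *p = s->head;
         p != NULL && !atomic_load_explicit(&s->done, memory_order_relaxed);
         p = p->next, pos++) {
        /* Spin while too far ahead, so lines are not evicted before use. */
        while (pos > atomic_load_explicit(&s->main_pos, memory_order_relaxed) + RUN_AHEAD)
            if (atomic_load_explicit(&s->done, memory_order_relaxed)) return NULL;
        __builtin_prefetch(p->next, 0, 1); /* hint only; never faults, even on NULL */
    }
    return NULL;
}

/* Main thread: same traversal, publishing progress for the helper. */
static long long list_sum_with_helper(struct node *head) {
    struct shared s;
    s.head = head;
    atomic_init(&s.main_pos, 0);
    atomic_init(&s.done, false);
    pthread_t tid;
    bool have_helper = pthread_create(&tid, NULL, prefetch_helper, &s) == 0;
    long long sum = 0;
    size_t pos = 0;
    for (const struct node *p = head; p != NULL; p = p->next) {
        sum += p->value;
        atomic_store_explicit(&s.main_pos, ++pos, memory_order_relaxed);
    }
    atomic_store_explicit(&s.done, true, memory_order_relaxed);
    if (have_helper) pthread_join(tid, NULL); /* falls back to plain traversal otherwise */
    return sum;
}
```

Usage: compile with `-pthread`, build a list with `list_build(n)`, check that `list_sum(l.head)` and `list_sum_with_helper(l.head)` return the same sum, then call `list_free(&l)`. The bounded run-ahead distance is the design choice worth noting: a helper that races too far ahead can evict lines before the main thread uses them, while one that falls behind does no useful work. Whether the helper actually speeds things up depends on the cache hierarchy and on which cores the OS schedules the two threads.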